Loosely coupled, lightweight microservices running in containers are likely to form complex execution dependencies inside the system. An execution dependency arises when two execution paths partially share component microservices, resulting in potential runtime performance interference. In this paper, we present a blackbox approach that uses legitimate HTTP requests to accurately profile the internal pairwise dependencies of all supported execution paths in the target microservices application. Concretely, we profile the pairwise dependency of two execution paths through performance interference analysis, sending bursts of the two corresponding request types simultaneously. By characterizing and grouping all the execution paths based on their pairwise dependencies, the blackbox approach can derive clear dependency graphs of the entire backend of the microservices application. We validate the effectiveness of the blackbox approach through experiments on open-source microservices benchmark applications running on real clouds (e.g., EC2, Azure).
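As a rough illustration of the interference test described above, the sketch below measures each of two request types in a solo burst, then fires concurrent bursts of both, and flags the pair as dependent when the concurrent latency is noticeably inflated. The endpoint URLs, burst size, and the 1.3x inflation threshold are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of pairwise interference profiling for two execution paths.
# URLs, burst size, and the 1.3x threshold are illustrative assumptions.
import time
import statistics
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def measure_burst(url, n=50):
    """Send a burst of n requests down one execution path; return median latency."""
    def one(_):
        start = time.perf_counter()
        urlopen(url).read()
        return time.perf_counter() - start
    with ThreadPoolExecutor(max_workers=n) as pool:
        return statistics.median(pool.map(one, range(n)))

def pairwise_dependency(url_a, url_b, threshold=1.3):
    """Flag two paths as dependent if concurrent bursts inflate either latency."""
    solo_a = measure_burst(url_a)
    solo_b = measure_burst(url_b)
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_a = pool.submit(measure_burst, url_a)
        fut_b = pool.submit(measure_burst, url_b)
        mixed_a, mixed_b = fut_a.result(), fut_b.result()
    # Interference on either path suggests shared component microservices.
    return mixed_a > threshold * solo_a or mixed_b > threshold * solo_b
```

Repeating this test over all pairs of supported request types yields the pairwise dependency matrix from which the dependency graphs are grouped.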
-
Mission-critical, real-time, continuous stream processing applications that interact with the real world have stringent latency requirements. For example, e-commerce websites like Amazon improve their marketing strategy by performing real-time advertising based on customers' behavior, and the latency long tail can cause significant revenue loss. Recent work [39] showed a positive correlation between the latency long tail and variance in the execution time of synchronous invocation chains (critical paths) in microservices benchmarks. This paper shows that asynchronous, very short but intense resource demands (called millibottlenecks) outside of critical paths can also cause a significant latency long tail. Using a traffic-analysis stream processing benchmark application, we evaluated the impact of asynchronous workload bursts generated by a multi-layer data structure called the LSM-tree (log-structured merge-tree), used for continuous checkpointing. Outside of the critical path, the LSM-tree relies on maintenance operations (e.g., flushing/compaction during a checkpoint) to reorganize itself in memory and on disk to keep data access latency short. Although asynchronous, such recurrent maintenance operations can cause frequent millibottlenecks, particularly when they overlap, a problem we call ShadowSync. Due to both scheduling and statistical effects, ShadowSync arising from asynchronous recurrent operations can produce a significant latency long tail. Our experimental results show that with typical settings of benchmark components such as RocksDB, ShadowSync can prolong request message latency by up to 2 seconds. We show that effective mitigation methods can alleviate both scheduled and statistical ShadowSync, reducing the latency long tail to less than 20% of the original at the 99.9th percentile.
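To make the ShadowSync condition concrete, the sketch below scans two streams of recurrent maintenance intervals (e.g., flushes and compactions) and reports the windows where they overlap; such overlapping windows are the candidate millibottlenecks. The (start, end) interval representation and the example timestamps are illustrative assumptions, not the paper's instrumentation.

```python
# Minimal sketch of detecting ShadowSync-style overlap between two streams of
# recurrent maintenance operations. Interval format is an assumption.
def find_overlaps(flushes, compactions):
    """Return time windows where a flush and a compaction overlap.

    Each input is a time-sorted list of (start, end) timestamps in seconds.
    Overlapping operations are ShadowSync candidates, since their combined
    burst can briefly saturate disk or CPU."""
    overlaps, i, j = [], 0, 0
    while i < len(flushes) and j < len(compactions):
        start = max(flushes[i][0], compactions[j][0])
        end = min(flushes[i][1], compactions[j][1])
        if start < end:                  # the two operations overlap
            overlaps.append((start, end))
        # advance whichever interval finishes first
        if flushes[i][1] < compactions[j][1]:
            i += 1
        else:
            j += 1
    return overlaps

# Example: a flush at t=1.0-1.2s overlapping a compaction at t=1.1-1.5s.
print(find_overlaps([(1.0, 1.2)], [(1.1, 1.5)]))  # -> [(1.1, 1.2)]
```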
-
The broad adoption of fanout queries on distributed datastores has made asynchronous event-driven datastore drivers a natural choice due to their reduced multithreading overhead. However, through extensive experiments using the latest datastore drivers (e.g., MongoDB, HBase, DynamoDB) and the YCSB benchmark, we show that an asynchronous datastore driver can cause unexpected performance degradation, especially in fanout-query scenarios. For example, the default MongoDB asynchronous driver adopts the latest Java asynchronous I/O library, which uses a hidden on-demand JVM-level thread pool to process fanout query responses, causing surprising multithreading overhead when the query response size is large. A second instance is that the traditionally modular design of an application server and its embedded asynchronous datastore driver can produce an imbalanced workload between the two components due to a lack of coordination, incurring frequent unnecessary system calls. To address the revealed problems, we introduce DoubleFaceAD, a new asynchronous datastore driver architecture that integrates the management of both upstream and downstream workload traffic through a few shared reactor threads, with fanout-query-aware priority-based scheduling to reduce the overall query waiting time. Our experimental results on two representative application scenarios (YCSB and DBLP) show that DoubleFaceAD outperforms all other types of datastore drivers by up to 34% in throughput and is 1.9× faster in 99th-percentile response time.
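The sketch below illustrates one way fanout-query-aware priority scheduling can look inside a single shared reactor loop: responses whose parent fanout query has the fewest outstanding sub-queries are dispatched first, so nearly complete queries finish sooner. The class, the priority rule, and the toy workload are illustrative assumptions, not the DoubleFaceAD implementation.

```python
# Minimal sketch of fanout-aware priority scheduling in one shared reactor,
# in the spirit of DoubleFaceAD. All names and the priority rule are
# illustrative assumptions.
import asyncio
import heapq

class FanoutScheduler:
    """Single reactor loop; dispatches sub-query responses by priority."""
    def __init__(self):
        self._heap = []   # (outstanding_subqueries, seq, callback)
        self._seq = 0     # tie-breaker keeps heap comparisons well-defined

    def submit(self, outstanding, callback):
        heapq.heappush(self._heap, (outstanding, self._seq, callback))
        self._seq += 1

    async def run(self):
        # Dispatch the response whose parent fanout query is closest to
        # completion first, reducing overall query waiting time.
        while self._heap:
            _, _, callback = heapq.heappop(self._heap)
            callback()
            await asyncio.sleep(0)   # yield back to the event loop

async def main():
    sched = FanoutScheduler()
    # A query with 1 sub-query left jumps ahead of one with 5 left.
    sched.submit(5, lambda: print("response for 5-way fanout"))
    sched.submit(1, lambda: print("response for nearly-done fanout"))
    await sched.run()

asyncio.run(main())
```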
-
Dynamically reallocating computing resources to handle bursty workloads is a common practice for web applications (e.g., e-commerce) in clouds. However, our empirical analysis on a standard n-tier benchmark application (RUBBoS) shows that simply scaling an n-tier application by reallocating hardware resources, without quickly adapting soft resources (e.g., server threads, connections), may lead to large response time fluctuations. This is because soft resources control the workload concurrency of component servers in the system: adding or removing hardware resources such as Virtual Machines (VMs) can implicitly change the workload concurrency of dependent servers, causing either under- or over-utilization of the critical hardware resource in the system. To quickly identify the optimal soft resource allocation of each server in the system and stabilize response time fluctuations, we propose a novel Scatter-Concurrency-Throughput (SCT) model based on monitoring each server's real-time concurrency and throughput. We then implement a Concurrency-aware system Scaling (ConScale) framework that integrates the SCT model to quickly adapt the soft resource allocations of key servers during the system scaling process. Our experiments using six realistic bursty workload traces show that ConScale can effectively mitigate the response time fluctuations of the target web application compared to state-of-the-art cloud scaling strategies such as EC2-AutoScaling.
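To illustrate the intuition behind the SCT model, the sketch below bins real-time (concurrency, throughput) samples of one server and walks up the averaged curve until extra concurrency stops yielding meaningful throughput gains. The binning and the 1% improvement cutoff are illustrative assumptions, not the paper's exact model.

```python
# Minimal sketch of the Scatter-Concurrency-Throughput idea: estimate the
# concurrency level beyond which a server's throughput stops improving.
# The 1% gain cutoff and sample numbers are illustrative assumptions.
from collections import defaultdict
from statistics import mean

def optimal_concurrency(samples, min_gain=1.01):
    """samples: iterable of (concurrency, throughput) measurements."""
    by_level = defaultdict(list)
    for conc, tput in samples:
        by_level[conc].append(tput)
    # Average throughput per observed concurrency level, sorted by level.
    curve = sorted((c, mean(ts)) for c, ts in by_level.items())
    best_conc, best_tput = curve[0]
    for conc, tput in curve[1:]:
        if tput >= best_tput * min_gain:   # still meaningful gains
            best_conc, best_tput = conc, tput
        else:
            break   # knee reached: more threads/connections only add queuing
    return best_conc

samples = [(4, 900), (8, 1700), (16, 2400), (32, 2410), (64, 2300)]
print(optimal_concurrency(samples))   # -> 16 with these example numbers
```

Setting each server's soft resource allocation near this knee is what keeps the critical hardware resource neither under- nor over-utilized as the system scales.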